Description

Background & Context

Thera bank recently saw a steep decline in the number of users of its credit card. Credit cards are a good source of income for banks because of the different kinds of fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, and foreign transaction fees. Some fees are charged to every user irrespective of usage, while others are charged under specified circumstances.

Customers leaving the credit card service would mean a loss for the bank, so the bank wants to analyze its customer data, identify the customers who will leave the credit card service, and understand their reasons, so that it can improve in those areas.

As a data scientist at Thera bank, you need to come up with a classification model that will help the bank improve its services so that customers do not renounce their credit cards.

Objective

- Explore and visualize the dataset.
- Build a classification model to predict whether the customer is going to churn or not.
- Optimize the model using appropriate techniques.
- Generate a set of insights and recommendations that will help the bank.

Data Dictionary:

Best Practices for Notebook:

The notebook should be well-documented, with inline comments explaining the functionality of code and markdown cells containing comments on the observations and insights. The notebook should be run from start to finish in a sequential manner before submission.

Perform an Exploratory Data Analysis on the data - (6 Marks)

Insights

There are 22 columns. According to the non-null counts, the columns have no null values,
with the exception of Unnamed: 21, which contains no values at all.
The data types are integer, float, and object.
The dataset has 10127 entries.

Observation

1. Attrition_Flag, Gender, Marital_Status, Income_Category, and Card_Category are categorical columns
2. CLIENTNUM is the unique customer ID and can be dropped
3. Unnamed: 21 can be dropped
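
The observations above could translate into a cleanup step like the following. This is a minimal sketch on a tiny made-up frame: only the column names come from the notebook, while the row values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the values are made up, only the column names
# (CLIENTNUM, Attrition_Flag, Gender, Unnamed: 21) come from the notebook.
df = pd.DataFrame({
    "CLIENTNUM": [1001, 1002, 1003],
    "Attrition_Flag": ["stays", "leaves", "stays"],
    "Gender": ["M", "F", "F"],
    "Unnamed: 21": [np.nan, np.nan, np.nan],
})

# Drop the identifier and the empty column. errors="ignore" keeps the call
# safe if one of the columns has already been removed.
df = df.drop(columns=["CLIENTNUM", "Unnamed: 21"], errors="ignore")

# Convert the text columns to the pandas category dtype, one column at a time.
cat_cols = ["Attrition_Flag", "Gender"]
for col in cat_cols:
    df[col] = df[col].astype("category")

print(df.dtypes)
```

On the full dataset, `cat_cols` would list all five categorical columns named in the first observation.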

Data Preprocessing

Bivariate Analysis

Illustrate the insights based on EDA - (5 Marks)

- Key meaningful observations on the relationship between variables

Data Preparation

Data Pre-processing - (5 Marks)

- Prepare the data for analysis
- Missing value treatment
- Outlier detection (treat if needed; explain why or why not)
- Feature engineering
- Prepare data for modeling

Data Preparation

Split the data into train and test sets
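
A stratified split keeps the churn rate the same in both sets, which matters when the positive class is small. The sketch below uses synthetic data; the feature names, the 20% churn rate, and the 70/30 split are assumptions for illustration, not values taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 200

# Synthetic features with hypothetical names; the real notebook would use
# the cleaned bank dataset instead.
X = pd.DataFrame({
    "transaction_count": rng.integers(10, 140, size=n),
    "months_inactive": rng.integers(0, 7, size=n),
})
# 40 churners out of 200 customers (an assumed 20% churn rate).
y = pd.Series(np.r_[np.ones(40, dtype=int), np.zeros(160, dtype=int)], name="churn")

# stratify=y preserves the class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

print(len(X_train), len(X_test))
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```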

Missing Value treatment
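
One way to treat missing values without leaking information from the test set is to compute the fill values on the training set only and reuse them on the test set. The sketch below assumes mode imputation for a categorical column and median imputation for a numeric one; the frames, the `credit_limit` column, and the category labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up train and test frames. Income_Category is a column named in the
# notebook; credit_limit and all values are hypothetical.
train = pd.DataFrame({
    "Income_Category": ["low", "high", np.nan, "high"],
    "credit_limit": [1500.0, np.nan, 3000.0, 9000.0],
})
test = pd.DataFrame({
    "Income_Category": [np.nan, "low"],
    "credit_limit": [np.nan, 2000.0],
})

# Learn the fill values from the training data only.
fill_values = {
    "Income_Category": train["Income_Category"].mode()[0],  # most frequent label
    "credit_limit": train["credit_limit"].median(),         # median ignores NaN
}

# Apply the same values to both sets.
train = train.fillna(fill_values)
test = test.fillna(fill_values)

print(test)
```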

Encoding categorical variables
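
A common approach is one-hot encoding with `pd.get_dummies`. Because the test set may not contain every category seen in training, the sketch below encodes the training set first and then aligns the test columns to it. The category labels are invented; only the column names come from the notebook.

```python
import pandas as pd

# Made-up rows; Gender and Card_Category are column names from the notebook.
X_train = pd.DataFrame({
    "Gender": ["M", "F", "F"],
    "Card_Category": ["basic", "premium", "basic"],
})
X_test = pd.DataFrame({
    "Gender": ["F", "M"],
    "Card_Category": ["basic", "basic"],  # "premium" never appears here
})

# Encode the training set, dropping the first level of each column to
# avoid redundant dummy columns.
X_train_enc = pd.get_dummies(X_train, drop_first=True, dtype=int)

# Encode the test set without dropping levels, then keep exactly the
# training columns; categories missing from the test set become all-zero.
X_test_enc = pd.get_dummies(X_test, dtype=int).reindex(
    columns=X_train_enc.columns, fill_value=0
)

print(X_train_enc.columns.tolist())
print(X_test_enc)
```

Aligning with `reindex` keeps the feature matrix shape identical between training and prediction, which the models built later require.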

Model building - Logistic Regression - (6 Marks)

- Make a logistic regression model
- Improve model performance by upsampling and downsampling the data
- Regularize the above models, if required

Model building - Bagging and Boosting - (8 Marks)

- Build decision tree, random forest, and bagging classifier models
- Build XGBoost, AdaBoost, and gradient boosting models
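
A compact way to build and compare these models is to loop over a dictionary of estimators. The sketch below uses synthetic data and scikit-learn estimators only, with default hyperparameters. XGBoost is left out so the example depends on one library; assuming the `xgboost` package is installed, its `XGBClassifier` could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the prepared bank dataset.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=1),
    "random_forest": RandomForestClassifier(random_state=1),
    "bagging": BaggingClassifier(random_state=1),
    "adaboost": AdaBoostClassifier(random_state=1),
    "gradient_boosting": GradientBoostingClassifier(random_state=1),
}

# Fit each model and record recall on the held-out test set.
model_recall = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    model_recall[name] = recall_score(y_test, model.predict(X_test))

for name, score in sorted(model_recall.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: recall = {score:.3f}")
```

Recall is used here because the notebook's later insights judge models by recall; other metrics can be added to the loop as needed.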

Hyperparameter tuning using grid search - (8 Marks)

- Tune all the models using grid search 
- Use pipelines in hyperparameter tuning
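
Wrapping preprocessing and the model in a `Pipeline` lets the grid search refit every step inside each cross-validation fold. The sketch below is one possible setup on synthetic data: the imputer step, the decision tree, the parameter grid, and the choice of recall as the scoring metric are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Each step is re-fitted on the training part of every fold.
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("clf", DecisionTreeClassifier(random_state=1)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "clf__max_depth": [3, 5, None],
    "clf__min_samples_leaf": [1, 5, 10],
}

grid = GridSearchCV(pipe, param_grid, scoring="recall", cv=3)
grid.fit(X_train, y_train)

print("best params:", grid.best_params_)
print("best CV recall:", round(grid.best_score_, 3))
print("test score (recall):", round(grid.score(X_test, y_test), 3))
```

The same pattern applies to the other models: swap the `clf` step and adjust the `clf__` keys in the grid.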

Observation

Hyperparameter tuning using random search - (8 Marks)

XGBoost

Grid search cv

Random Search - XGBoost
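
Random search samples a fixed number of parameter combinations from distributions instead of trying every grid point. The sketch below is a stand-in, not the notebook's XGBoost run: it uses scikit-learn's `GradientBoostingClassifier` in place of XGBoost so that it runs without extra packages, and the data, parameter ranges, and `n_iter=5` are assumptions.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Gradient boosting stands in for XGBoost in this illustration.
pipe = Pipeline([("clf", GradientBoostingClassifier(random_state=1))])

# Distributions to sample from; the ranges are illustrative only.
param_distributions = {
    "clf__n_estimators": randint(50, 150),     # integers from 50 to 149
    "clf__learning_rate": uniform(0.01, 0.3),  # floats in [0.01, 0.31]
    "clf__max_depth": randint(2, 5),           # integers from 2 to 4
}

search = RandomizedSearchCV(
    pipe,
    param_distributions,
    n_iter=5,          # number of sampled combinations
    scoring="recall",
    cv=3,
    random_state=1,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("best CV recall:", round(search.best_score_, 3))
```

Compared with a grid search over the same ranges, the cost is controlled directly by `n_iter`, which is why random search is often tried first on models with many hyperparameters.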

Model Performances - (5 Marks)

Insights

XGBoost tuned with RandomizedSearchCV performed well compared to the other models, and its recall score is good.

Actionable Insights & Recommendations - (5 Marks)

  1. The bank should target customers with a high transaction count over the last 12 months. As their transaction count increases, they are more likely to stop using the service. The bank can identify these customers from their transactions and offer them cash back to help retain them.
  2. Total transaction amount and revolving balance affect churn; the bank can focus on these customers and provide them with offers.
  3. Customers who hold more products (number of products held by the customer) should be targeted: the bank can launch new products and offer added benefits to retain them.
  4. The bank should also focus on inactive customers (not active for months) and provide offers that encourage them to use the card.
  5. Customers who have had an extended relationship with the bank are most likely to stay, so the bank should focus on the customer relationship.
  6. Relationship and transaction count are the features that affect churn the most; the bank should target those users and provide added benefits to increase transactions, keep them active, and encourage them to buy new products.

Notebook - Overall quality - (4 Marks)